Text file dated 1994-01-27
Introduction
Originally used to reduce storage and administration costs in mainframe and
mid-sized systems, Hierarchical Storage Management (HSM) is now helping manage
data on LANs. The theory behind HSM is the same for mainframes and LANs alike:
primary on-line storage is costly if not managed efficiently.
In LAN environments, the burden of storage management is heightened as it
affects not only the network managers but nearly every user as well. Time spent
dealing with the capacity of server and workstation volumes can add up to a
loss far greater than the price of the disks themselves. By completely
automating the management of disk capacity, HSM can deliver the benefits of
near-infinite storage while minimizing cost, labor, and overhead.
This White Paper illustrates how HSM provides the best solution to the
never-ending need to increase LAN storage. Refer to the glossary at the end
of this White Paper for a description of the terms used herein.
HSM: The first effective solution to the LAN storage dilemma
Disk volumes on LAN servers always fill up - it's almost a law of physics.
Until recently, there have been only three ways to deal with the problem when
it happens:
Approach #1
Buy more disk. This solution does solve the problem, but at a price. Disk
storage on a server typically costs between $1,000 and $2,000 per gigabyte and
twice that for mirroring, making continuing disk expansion rather expensive.
But the disk drive is only the beginning; you also need controllers, cables,
power, and physical space. Furthermore, as the on-line capacity grows, more RAM
is needed for the file server to service disk cache, more disk connections are
used in both software and hardware, and more and more backup systems are needed
to protect the additional hard drive volumes of data.
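As a rough illustration of the arithmetic, the sketch below applies the
per-gigabyte range quoted above to a hypothetical expansion. The 4-gigabyte
figure and the function name are assumptions chosen for the example, and the
result covers the drives alone; controllers, cables, RAM, and backup systems
are not counted.

```python
# Illustrative only: cost range from the text ($1,000-$2,000 per gigabyte,
# doubled when mirrored). The 4 GB expansion size is a made-up example.
LOW_PER_GB = 1_000
HIGH_PER_GB = 2_000

def disk_cost_range(gigabytes, mirrored=False):
    """Return the (low, high) dollar cost of the disk drives alone."""
    factor = 2 if mirrored else 1
    return (gigabytes * LOW_PER_GB * factor, gigabytes * HIGH_PER_GB * factor)

print(disk_cost_range(4))
print(disk_cost_range(4, mirrored=True))
```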
Approach #2
Have the users remove "clutter." When buying more disks isn't a practical
option, and the network manager doesn't have the time or appropriate tools to
manage data, this is the only remaining option. While its apparent costs are
zero (no disk or special software to purchase), this option is generally the
least reliable and most expensive. Manual deletion of files is time consuming
and inherently risky. Users may inadvertently delete files which are valuable,
or they may remove so few files as to make the exercise pointless. It may take
days for an administrator to convince users to "clean up their act," only to
find they have also removed applications, important data files or application
configuration files. While cooperation may have yielded some additional disk
space, more time must be spent putting the system back in shape.
Approach #3
Migrate the files to off-line storage. This approach involves making extra
off-line copies of inactive files, and then removing them from the on-line
storage. This does allow space to be reclaimed in a reasonable time period, but
has the drawback that user access to the migrated data is inconvenient at best.
In most cases the users must request specific files from the system
administrator and then wait for the data to be found in the off-line archives
and then hopefully be restored. Often, users will not know the specific name or
location of migrated data, and a more imprecise and often fruitless task of
finding the data on off-line media must begin. This process may take hours or
even days. And as the amount of migrated data grows, the recall requests can
become unmanageable.
The HSM Alternative
HSM (Hierarchical Storage Management) is an automated process that provides
the best solution to the never-ending demand for increases in LAN storage. By
moving inactive data to less expensive secondary storage, and recalling it
automatically when needed, users receive the benefits of near-infinite disk
storage without the cost.
Figure 1. Storage Tiers
Compared to approach #1 (buying more disk), HSM provides the same 100%
accessibility at a much lower cost. Compared to approach #3 (migrating old
data), HSM provides the same cost savings without the administrative burden or
inconvenience to the user. Compared to approach #2 (have users remove clutter),
HSM allows users to go on doing productive work and handles the storage
management tasks automatically and safely. Most HSM systems perform the three
following basic functions:
Pre-staging of inactive data to secondary storage. Some form of secondary
storage is used. Typically, this takes the form of a tape autoloader, an
optical jukebox, a large tape drive, or some combination of these
devices. A key characteristic of the secondary storage is a cost per gigabyte
much lower than that of magnetic disk (primary storage). The HSM system is
responsible for moving copies of inactive files into the secondary storage, in
anticipation of the need to remove them from primary storage as the disk
volumes fill. (Data is pre-staged to avoid the performance burden of
transferring large amounts of data during peak usage hours.)
Monitoring of primary storage volumes with migration as needed. Throughout
the day and night, the HSM system monitors the volumes it services. If the
amount of data on the volume exceeds the configurable "high water mark"
(usually expressed as a percentage of total disk space), migration will occur.
If pre-staging has been used, this migration requires nothing more than
selecting the right files to migrate, shortening them to a very small "phantom"
or "stub" file as a place holder, and stopping the migration when the specified
"low water mark" is reached.
Automatic recall of files as needed. The users continue to see all data as
on-line. When a file is accessed, a recall agent, resident in the client or in
the server, initiates the process of moving the file from secondary storage
back to primary storage.
These three processes combined allow for rapid LAN storage growth without the
expense of on-line disk or the administrative overhead of manual storage
management. As the disks fill, inactive files are moved off to less expensive
storage, but remain accessible to users without administrator intervention.
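The three functions described above can be sketched in a few lines. This is a
minimal illustration under stated assumptions, not any vendor's
implementation: the File and Volume classes, the 80% and 60% water marks, the
least-recently-used ordering, and the zero-size stub are all choices made for
the example.

```python
from dataclasses import dataclass, field

@dataclass
class File:
    name: str
    size: int              # units occupied on primary storage
    last_access: int       # larger means more recently used
    prestaged: bool = False
    migrated: bool = False

@dataclass
class Volume:
    capacity: int
    high_water_pct: int = 80   # assumed value: migrate when usage exceeds this
    low_water_pct: int = 60    # assumed value: stop migrating at this level
    files: list = field(default_factory=list)
    secondary: dict = field(default_factory=dict)  # stands in for tape/optical

    def used(self):
        return sum(f.size for f in self.files)

    def prestage(self, f):
        """Copy an inactive file to secondary storage ahead of need."""
        self.secondary[f.name] = f.size
        f.prestaged = True

    def monitor(self):
        """Shorten pre-staged files to stubs, oldest access first."""
        if self.used() * 100 <= self.high_water_pct * self.capacity:
            return
        candidates = sorted(
            (f for f in self.files if f.prestaged and not f.migrated),
            key=lambda f: f.last_access)
        for f in candidates:
            if self.used() * 100 <= self.low_water_pct * self.capacity:
                break
            f.size, f.migrated = 0, True   # stub modeled as zero size

    def recall(self, f, now):
        """Restore a migrated file to primary storage when it is accessed."""
        if f.migrated:
            f.size, f.migrated = self.secondary[f.name], False
        f.last_access = now

vol = Volume(capacity=100)
vol.files = [File("a.doc", 30, 1), File("b.doc", 30, 2), File("c.doc", 30, 3)]
for f in vol.files[:2]:
    vol.prestage(f)
vol.monitor()
print([(f.name, f.migrated) for f in vol.files], vol.used())
```

Because the copy to secondary storage already exists, the monitor step moves
no data; it only shortens files until usage falls to the low water mark.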
Why is HSM for LANs such a visible topic now?
Several independent trends have converged to make HSM attractive for LANs.
Size of LAN storage
LAN data and storage continue to grow exponentially. Even with the decreasing
costs of magnetic disks, keeping up with the storage growth is an expensive
proposition. Beyond the costs of the disks themselves, issues with server
configuration (RAM, SCSI ports, slots) and the time and cost involved in
management of larger storage systems are creating a greater need for an HSM
solution.
Figure 2. Typical LAN Capacity Profile
Availability of robotic storage devices
The last two years have seen an increase in availability and a decrease in cost
of high-capacity autoloaders and jukeboxes. These robotic devices have matured
in terms of both dependability and programmability, making them more responsive
to sophisticated software control. The ready availability of low-cost secondary
storage is a prerequisite to HSM; the enhanced capabilities for these devices
make them ideal storage management components.
Availability of HSM software
While file migration software (with manual restores) has been around for some
time, truly automated HSM software was introduced to NetWare LANs only
recently, in 1993. The automation of a true HSM system greatly improves usability of
secondary storage.
HSM is not a new concept. HSM systems, both automated and semi-automated, have
served mainframe and minicomputer platforms for years. As LANs grow in
complexity and storage requirements to the level of mainframe systems, the need
for HSM becomes more apparent.
The Palindrome HSM Software Approach: Integrating Backup, Archiving, and HSM
Palindrome HSM Software operates in conjunction with Palindrome's award-winning
Network Archivist backup and archiving software. Palindrome HSM Software builds
on the intelligent storage management architecture of Network Archivist to
provide a flexible, scalable HSM solution applicable to a wide range of LAN
environments. In addition, the tight integration of Palindrome HSM Software
with Network Archivist provides a total storage management environment, capable
of managing the entire backup, archiving, and HSM functions in one seamless,
robust, and easy-to-administer package. This integration provides superior
reliability and also allows surprisingly affordable implementation. Further,
the modularity of the system approach also allows LAN sites to customize and
configure their storage management environment to fit their individual needs
and to grow as LAN size increases.
Figure 3. Palindrome HSM Software Architecture
Palindrome HSM Software is composed of Palindrome HSM Volume Monitor and
Palindrome Client Recall Agents.
Palindrome HSM Volume Monitor
Residing as an NLM on NetWare servers, the Palindrome Volume Monitor checks the
disk capacity of the servers under its protection at configurable intervals to
determine how full the storage devices have become. When any disk exceeds its
high-water mark - a level which the administrator has decided would be too full -
the Palindrome system automatically converts data on the disk (which is
eligible for migration and already pre-staged onto secondary media) to
zero-byte phantom files. This migration of pre-staged data continues until the
disk reaches its low water mark - a percentage of disk which is acceptably full
and allows ample room for the server to continue running effectively. The
monitoring capability of Palindrome HSM Software is extremely powerful and
flexible. High and low water marks can be set individually for each server
volume, permitting customized storage management according to a volume's
capacity, storage requirements and disk activity. Prior to migration, files
are permanently archived to ensure that they can be restored.
The HSM migration process leaves behind zero-byte phantom files in the process
of removing eligible files from storage - the file's original file name remains
on the server, but the file itself has been removed. This phantom file is key
to recalling the data at a later time. The order in which data is migrated can
also be customized in a number of ways:
Least recently used. If the monitor is set for this option, the least
recently used data is migrated first until the low water mark is reached. Files
that have gone unaccessed the longest are less likely to be needed
immediately and are good candidates for moving off primary storage. This option
is easiest to understand, and to explain, and is the default option in
Palindrome HSM. This option, however, may not be the best for all computing
environments.
Largest first. If this option is chosen, all files that are eligible for
migration are sorted by size and removed from primary storage in that order until
the low water mark is reached. In any environment, regardless of applications
used, this option always migrates the fewest possible files to attain its
migration goals and therefore reduces any future restore requests. Fewer
migrations and fewer restores result in increased system performance.
Most Eligible. Depending on how an administrator sets the rules for
migrating files, Palindrome HSM software can determine which files are most
eligible to be migrated. For example, files that have a "quick" migrate date -
say a few weeks - would be more eligible than files with 52-week migration
eligibility dates. The system can be set to prioritize migrations in just such
a manner. Those files which are "more eligible" are the first removed (i.e.
their migration dates far exceed their migration rule).
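The three ordering options can be viewed as three sort keys applied to the
same list of eligible files. The sketch below is an illustration only: the
record layout, the sample files, and the reading of "most eligible" as the
greatest number of weeks beyond the rule's migration period are assumptions,
not details taken from the product.

```python
from collections import namedtuple

# Hypothetical record: weeks_idle is the time since last access and
# rule_weeks is the migration period set by the administrator's rule.
Candidate = namedtuple("Candidate", "name size_kb weeks_idle rule_weeks")

files = [
    Candidate("report.doc",   200, 60, 52),
    Candidate("scratch.tmp", 5000, 30,  2),
    Candidate("old.dbf",      900, 80, 12),
]

# Each policy is a sort key; negation puts the largest value first.
ORDERINGS = {
    "least_recently_used": lambda c: -c.weeks_idle,
    "largest_first":       lambda c: -c.size_kb,
    "most_eligible":       lambda c: -(c.weeks_idle - c.rule_weeks),
}

def migration_order(candidates, policy):
    """Return file names in the order the chosen policy would migrate them."""
    return [c.name for c in sorted(candidates, key=ORDERINGS[policy])]

for policy in ORDERINGS:
    print(policy, migration_order(files, policy))
```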
Pre-staged Files and "Eligible for Migration"
It's important to note that one of Palindrome HSM's strengths comes from its
tight integration with the Network Archivist software. With file protection
rules set in the backup and archiving software, files are redundantly protected
across more than one piece of secondary storage, ensuring that these files will
remain protected even if one tape or optical disk is damaged, or a complete
site disaster occurs. Once files are protected redundantly and reach the
administrator's criteria for migration (not accessed for 12 weeks, for
example), these eligible files can be simply converted to phantom files on the
server as they've already been pre-staged to secondary media through the
Network Archivist system of archiving. This "instant" migration of eligible
files from the server is an important advantage - no network traffic is
generated moving files to secondary storage, and no separate migration
operations need ever be performed.
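The eligibility test described in this section amounts to two conditions that
must both hold. A minimal sketch follows, assuming a threshold of two
redundant copies and using the 12-week example from the text; the function
and parameter names are hypothetical.

```python
def eligible_for_migration(copies_on_secondary, weeks_since_access,
                           rule_weeks=12, min_copies=2):
    """True when a file is redundantly protected AND inactive long enough.

    min_copies=2 is an assumed reading of "more than one piece of secondary
    storage"; rule_weeks=12 is the example period given in the text.
    """
    protected = copies_on_secondary >= min_copies
    inactive = weeks_since_access >= rule_weeks
    return protected and inactive

print(eligible_for_migration(copies_on_secondary=2, weeks_since_access=12))
print(eligible_for_migration(copies_on_secondary=1, weeks_since_access=40))
```

A file that passes this test can be converted to a phantom file at once,
since its data already resides on secondary media.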
Palindrome Client Recall Agents
Whether an attached workstation on the network is running DOS, Windows, or
OS/2, the Palindrome Recall Agent is loaded into the machine's memory at
start-up. These agents run in the background waiting for the client's access of
phantom files - zero-byte files left behind as place markers for migrated files
(see Figure 3). Users can access these files even within applications,
calling up a migrated text file, for example, from within a word processor.
When the filename is accessed, the recall agent steps in. First, it notifies
the user via a pop-up window that the file has been migrated and automatically
queues the file for restore. The recall agent submits the filename into the
Palindrome queue where it is referenced in the Palindrome File History Database
for immediate retrieval.
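The recall flow above can be sketched as a small queue handler. Everything
here is a stand-in chosen for illustration: a dict plays the role of the file
history database, a deque plays the role of the recall queue, and the file
and media names are invented.

```python
from collections import deque

# Hypothetical stand-in for the file history database: maps a file name to
# the secondary medium that holds its data.
history_db = {"Q3PLAN.DOC": "tape-0042"}
recall_queue = deque()
notices = []   # messages that would appear in the pop-up window

def open_file(name, size_on_disk):
    """Client side: return True if opening this file queued a recall."""
    is_phantom = size_on_disk == 0 and name in history_db
    if not is_phantom:
        return False
    notices.append(f"{name} has been migrated; recall queued")
    recall_queue.append(name)
    return True

def service_queue():
    """Server side: resolve each queued name to the medium holding its data."""
    located = []
    while recall_queue:
        name = recall_queue.popleft()
        located.append((name, history_db[name]))
    return located

open_file("Q3PLAN.DOC", 0)
print(notices)
print(service_queue())
```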
User Notification and Control
No HSM system can be successful if it causes confusion or frustration for the
users. When a file recall is in progress, a pop-up informs the user, showing
the file name, queue status, and an activity indicator. The user has three
options: to do nothing and let the recall complete, to continue without waiting
(useful in recalling groups of files), or to delete the recall request.
If installed, Palindrome's File Manager can allow users to initiate the recall
of a whole project, rather than requesting the files sequentially via the
recall agent.
Scalable HSM
Because Palindrome delivers an integrated approach to a storage management
environment, implementing HSM on a LAN can be done gradually, as storage
requirements demand and budget allows. A business can purchase Network
Archivist for backup and archiving to tape, customize the Archivist rules over
a period of time to get the optimal protection for their environment, and then
move into HSM in their own way.
Many LANs can deploy basic HSM functionality with a single tape drive or pair
of drives, for example. Due to Network Archivist's flexible media handling
capabilities, one tape drive can be designated to handle incremental backup
data while another tape drive is set to handle near-line storage. This larger
tape can remain permanently in the drive to service demigration requests,
adding 4-10 gigabytes of storage at low cost.
As LAN primary disk and migrated data demands grow, the administrator can add
an optical disc, a tape autoloader, an optical jukebox, or any combination -
without re-implementation or lengthy configuration - to extend the reach and
depth of the HSM services.
Conclusion
Palindrome is the LAN industry leader in automated storage management. Plans
and accommodations for the current HSM products were a part of Network
Archivist's growth path since the shipment of Network Archivist in 1989. The
thoughtful integration of HSM into the current product line is a natural
evolution of the Palindrome storage management philosophy. The power,
flexibility, and economy of the Palindrome HSM system make it the most
responsive to the local area network environment, now and in the future.
Glossary
On-Line: hard disk drive (often referred to as primary storage).
Near-line: mechanically available data, usually stored on tapes or optical
media; in an HSM system, end-users have no need to know whether a file resides
on primary or near-line storage (often referred to as secondary storage).
Off-line: media held in a vault or on a shelf and not immediately available to
the storage management software.
Pre-stage: to make copies of data eligible for migration onto secondary media.
Once the data is protected redundantly, on multiple media, it can simply be
removed from primary storage, as it has already been "moved" to secondary
through pre-staging.
Migrate: to move data from one storage medium to another, usually lower in the
hierarchy.
Phantom file: a filename, occupying zero bytes, which stands as a placeholder
for data which has been migrated from primary storage.
Recall: opposite of migrate; bring back to primary storage.